
























Another way to look at it...

CSE 30321 - Lecture 20-21 - Pipelining (Hazards & Examples)



# Questions about control signals

- Following discussion relevant to a single instruction
- Q: Are all control signals active at the same time?
- A: ?
- Q: Can we generate all these signals at the same time?
- A: ?

University of Notre Dame, Department of Computer Science & Engineering



# Pipelined datapath w/control signals

(Figure: pipelined datapath with control signals)


# Hazards (Let's start on the chalkboard)


# How do we deal with hazards?

- Often, pipeline must be stalled
- Stalling pipeline usually lets some instruction(s) in pipeline proceed while another/others wait for data, a resource, etc.
- A note on terminology:
  - If we say an instruction was "issued *later* than instruction x", we mean that it was issued after instruction x and is not as far along in the pipeline
  - If we say an instruction was "issued *earlier* than instruction x", we mean that it was issued before instruction x and is further along in the pipeline

# The hazards of pipelining

- Pipeline hazards prevent next instruction from executing during designated clock cycle
- There are 3 classes of hazards:
  - Structural Hazards:
    - Arise from resource conflicts
    - HW cannot support all possible combinations of instructions
  - Data Hazards:
    - Occur when given instruction depends on data from an instruction ahead of it in pipeline
  - Control Hazards:
    - Result from branch, other instructions that change flow of program (i.e. change PC)


### Stalls and performance

- Stalls impede progress of a pipeline and result in deviation from 1 instruction executing/clock cycle
- Pipelining can be viewed to:
  - Decrease CPI or clock cycle time for instruction
  - Let's see what effect stalls have on CPI...
- CPI pipelined =
  - Ideal CPI + Pipeline stall cycles per instruction
  - 1 + Pipeline stall cycles per instruction
- Ignoring overhead and assuming stages are balanced:

$$\text{Speedup} = \frac{\text{CPI unpipelined}}{1 + \text{Pipeline stall cycles per instruction}}$$
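A minimal sketch of the speedup relation, assuming (as is usual when stages are balanced and overhead is ignored) that the unpipelined CPI equals the pipeline depth. The 5-stage depth and the 0.3 stall cycles per instruction are illustrative values only, and `pipeline_speedup` is a hypothetical helper.

```python
def pipeline_speedup(cpi_unpipelined, stall_cycles_per_instr):
    """Speedup = CPI_unpipelined / (1 + stall cycles per instruction).

    Assumes balanced stages and no pipelining overhead, so both machines
    run at the same clock cycle time.
    """
    return cpi_unpipelined / (1 + stall_cycles_per_instr)

# Hypothetical 5-stage pipeline: unpipelined CPI taken as the pipeline depth (5)
print(pipeline_speedup(5, 0.0))   # 5.0 (no stalls: speedup equals depth)
print(pipeline_speedup(5, 0.3))   # stalls pull the speedup well below 5
```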



# Structural hazards

- One way to avoid structural hazards is to duplicate resources
  - e.g., an ALU to perform an arithmetic operation and a separate adder to increment the PC
- If not all possible combinations of instructions can be executed, structural hazards occur
- Most common instances of structural hazards:
  - When a functional unit is not fully pipelined
  - When some resource is not duplicated enough
- Pipeline stalls as a result of hazards, so CPI increases from the usual "1"


# Or alternatively...

| Inst. # / Clock number | 1  | 2  | 3  | 4     | 5   | 6   | 7  | 8   | 9   | 10  |
|------------------------|----|----|----|-------|-----|-----|----|-----|-----|-----|
| LOAD                   | IF | ID | EX | MEM   | WB  |     |    |     |     |     |
| Inst. i+1              |    | IF | ID | EX    | MEM | WB  |    |     |     |     |
| Inst. i+2              |    |    | IF | ID    | EX  | MEM | WB |     |     |     |
| Inst. i+3              |    |    |    | stall | IF  | ID  | EX | MEM | WB  |     |
| Inst. i+4              |    |    |    |       |     | IF  | ID | EX  | MEM | WB  |
| Inst. i+5              |    |    |    |       |     |     | IF | ID  | EX  | MEM |
| Inst. i+6              |    |    |    |       |     |     |    | IF  | ID  | EX  |

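- LOAD's data-memory access (MEM, clock 4) "steals" the memory port that Inst. i+3 needs for its instruction fetch (assuming a single memory port shared by IF and MEM), so the pipeline stalls for one cycle.
- Thus, no instruction completes on clock cycle 8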
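The table can be reproduced with a tiny schedule model. This is a sketch that assumes a 5-stage pipeline whose instruction fetch and data access share ONE memory port, so an instruction's IF must wait in any cycle where a LOAD or STORE is in MEM; `fetch_schedule` and the op names are hypothetical.

```python
def fetch_schedule(ops):
    """Return (IF cycle, WB cycle) for each instruction, cycles numbered from 1.

    Assumes one shared memory port: a LOAD/STORE in MEM blocks any IF that cycle.
    """
    port_busy = set()          # cycles in which a load/store's MEM stage uses the port
    schedule = []
    fetch = 0
    for op in ops:
        fetch += 1             # in-order issue: at most one fetch per cycle
        while fetch in port_busy:
            fetch += 1         # structural hazard: IF stalls until the port is free
        if op in ("LOAD", "STORE"):
            port_busy.add(fetch + 3)   # IF, ID, EX, then MEM three cycles later
        schedule.append((fetch, fetch + 4))
    return schedule

sched = fetch_schedule(["LOAD"] + ["ALU"] * 6)
completed = {wb for _, wb in sched}
print(sched[3])            # (5, 9): Inst. i+3 fetched in cycle 5
print(8 in completed)      # False: no instruction completes on clock cycle 8
```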




# Remember the common case!

- All things being equal, a machine without structural hazards will always have a lower CPI.
- But, in some cases it may be better to allow them than to eliminate them.
- These are situations a computer architect might have to consider:
  - Is pipelining functional units or duplicating them costly in terms of HW?
  - Does structural hazard occur often?
  - What's the common case???


This is a data hazard




# Forwarding

- Problem illustrated on previous slide can actually be solved relatively easily: with forwarding
- In this example, result of the ADD instruction is not really needed until after ADD actually produces it
- Can we move the result from the EX/MEM register to the beginning of the ALU (where SUB needs it)?
  - Yes! Hence this slide!
- Generally speaking:
  - Forwarding occurs when a result is passed directly to the functional unit that requires it
  - Result goes from the output of one unit to the input of another
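To see why forwarding removes the ADD→SUB stall, here is a sketch that counts stall cycles between a producer and a consumer `distance` instructions later. It assumes the classic 5-stage pipeline, a register file written in the first half of a cycle and read in the second, and full forwarding paths when `forwarding=True`; `stall_cycles` is a hypothetical helper.

```python
def stall_cycles(distance, producer_is_load, forwarding):
    """Stall cycles before a consumer `distance` instructions after its producer.

    Stage offsets relative to the producer's IF: IF=0, ID=1, EX=2, MEM=3, WB=4.
    The consumer's ID is at distance + 1 and its EX at distance + 2.
    """
    if not forwarding:
        # Consumer's register read (ID) must not precede producer's WB.
        return max(0, 4 - (distance + 1))
    # Value is ready at the end of EX (ALU op) or MEM (load) and is needed
    # at the start of the consumer's EX.
    ready = 3 if producer_is_load else 2
    return max(0, ready + 1 - (distance + 2))

# ADD immediately followed by a SUB that uses its result
print(stall_cycles(1, producer_is_load=False, forwarding=False))  # 2
print(stall_cycles(1, producer_is_load=False, forwarding=True))   # 0
```

The same function shows the one case forwarding cannot hide: a load followed immediately by a use still costs one cycle.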




# HW Change for Forwarding

(Figure: datapath with the ID/EX, EX/MEM, and MEM/WB pipeline registers and the added forwarding paths)
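The forwarding hardware boils down to comparators that pick each ALU input. This sketch follows the forwarding conditions commonly given for the textbook MIPS pipeline (Patterson & Hennessy); the `StageRegs` type, `forward_select`, and the 2-bit mux encoding (10 = EX/MEM, 01 = MEM/WB) should be read as illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class StageRegs:
    """Fields of a later pipeline register (EX/MEM or MEM/WB) used for forwarding."""
    reg_write: bool   # the instruction in this stage will write a register
    rd: int           # its destination register number

def forward_select(src_reg, ex_mem, mem_wb):
    """ALU input mux select for one source operand of the instruction in EX.

    0b00: value read from the register file (held in ID/EX)
    0b10: forward the ALU result held in EX/MEM
    0b01: forward the value held in MEM/WB
    """
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == src_reg:
        return 0b10   # EX hazard: the most recent producer takes priority
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == src_reg:
        return 0b01   # MEM hazard
    return 0b00       # no hazard ($0 is never forwarded)

# add $1, $2, $3 now sits in EX/MEM; sub $4, $1, $5 is in EX and reads $1
print(forward_select(1, StageRegs(True, 1), StageRegs(False, 0)))  # 2 (i.e. 0b10)
```

The same function is applied once for rs and once for rt to drive the two ALU input muxes.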






# Memory Data Hazards

- We've seen register hazards; can also have *memory hazards*
  - RAW:
    - store R1, 0(SP)
    - load R4, 0(SP)

|                 | 1 | 2 | 3  | 4  | 5  | 6  |
|-----------------|---|---|----|----|----|----|
| Store R1, 0(SP) | F | D | EX | M  | WB |    |
| Load R4, 0(SP)  |   | F | D  | EX | M  | WB |

- In simple pipeline, memory hazards are easy
  - In order, one at a time, read & write in same stage
- In general though, more difficult than register hazards




Can't get data to subtract instruction unless...


# Data hazards and the compiler

- Compiler should be able to help eliminate some stalls caused by data hazards
- e.g., the compiler could avoid generating a LOAD instruction that is immediately followed by an instruction that uses the LOAD's destination register.
- Technique is called "pipeline/instruction scheduling"
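A greedy scheduling pass can be sketched in a few dozen lines. This is a simplified model, not a production scheduler: instructions are hypothetical `Instr` tuples, memory operands are symbolic variable names assumed never to alias (base registers are ignored), and a move is accepted only if it is dependence-legal and reduces the number of load-use pairs. The example is the classic `a = b + c; d = e - f` basic block.

```python
from typing import NamedTuple, Optional

class Instr(NamedTuple):
    op: str                      # "lw", "sw", or an ALU op such as "add"
    dst: Optional[str]           # register written (None for sw)
    srcs: tuple                  # registers read
    addr: Optional[str] = None   # symbolic memory location for lw/sw (assumed no aliasing)

def depends(a, b):
    """True if a and b cannot be reordered (RAW, WAR, WAW on registers or memory)."""
    reg = ((a.dst is not None and (a.dst in b.srcs or a.dst == b.dst))
           or (b.dst is not None and b.dst in a.srcs))
    mem = a.addr is not None and a.addr == b.addr and "sw" in (a.op, b.op)
    return reg or mem

def stalls(code):
    """Count load-use pairs; each costs one stall cycle even with full forwarding."""
    return sum(a.op == "lw" and a.dst in b.srcs for a, b in zip(code, code[1:]))

def schedule(block):
    """Greedily try to separate each load from an immediately following use."""
    code = list(block)
    for i in range(len(code) - 1):
        if not (code[i].op == "lw" and code[i].dst in code[i + 1].srcs):
            continue
        options = []
        for j in range(i + 2, len(code)):   # pull a later independent instruction up
            if not any(depends(code[j], code[k]) for k in range(i, j)):
                options.append(code[:i + 1] + [code[j]] + code[i + 1:j] + code[j + 1:])
        if i >= 1 and not depends(code[i], code[i - 1]):   # or hoist the load itself
            options.append(code[:i - 1] + [code[i], code[i - 1]] + code[i + 1:])
        better = [c for c in options if stalls(c) < stalls(code)]
        if better:
            code = min(better, key=stalls)
    return code

block = [
    Instr("lw", "Rb", (), "b"), Instr("lw", "Rc", (), "c"),
    Instr("add", "Ra", ("Rb", "Rc")), Instr("sw", None, ("Ra",), "a"),
    Instr("lw", "Re", (), "e"), Instr("lw", "Rf", (), "f"),
    Instr("sub", "Rd", ("Re", "Rf")), Instr("sw", None, ("Rd",), "d"),
]
print(stalls(block), stalls(schedule(block)))   # 2 0
```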




#### What about control logic?

- For MIPS integer pipeline, all data hazards can be checked during ID phase of pipeline
- If there is a data hazard, the instruction is stalled before it is issued
- Whether forwarding is needed can also be determined at this stage, and the control signals set accordingly
- If hazard detected, control unit of pipeline must stall pipeline and prevent instructions in IF, ID from advancing
- All control information carried along in pipeline registers so only these fields must be changed


# Some example situations

| Situation                            | Example                                                            | Action                                                                                                                                                                |
|--------------------------------------|--------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| No Dependence                        | LW R1, 45(R2)<br>ADD R5, R6, R7<br>SUB R8, R6, R7<br>OR R9, R6, R7 | No hazard possible because no dependence<br>exists on R1 in the immediately following<br>three instructions.                                                          |
| Dependence requiring<br>stall        | LW R1, 45(R2)<br>ADD R5, R1, R7<br>SUB R8, R6, R7<br>OR R9, R6, R7 | Comparators detect the use of R1 in the<br>ADD and stall the ADD (and SUB and OR)<br>before the ADD begins EX                                                         |
| Dependence overcome<br>by forwarding | LW R1, 45(R2)<br>ADD R5, R6, R7<br>SUB R8, R1, R7<br>OR R9, R6, R7 | Comparators detect the use of R1 in SUB<br>and forward the result of LOAD to the ALU<br>in time for SUB to begin with EX                                              |
| Dependence with<br>accesses in order | LW R1, 45(R2)<br>ADD R5, R6, R7<br>SUB R8, R6, R7<br>OR R9, R1, R7 | No action is required because the read of<br>R1 by OR occurs in the second half of the<br>ID phase, while the write of the loaded<br>data occurred in the first half. |


# Hazard Detection Logic

- Insert a bubble into pipeline if any are true:
  - ID/EX.RegWrite AND
    - ((ID/EX.RegDst=0 AND ID/EX.WriteRegRt=IF/ID.ReadRegRs) OR
    - (ID/EX.RegDst=1 AND ID/EX.WriteRegRd=IF/ID.ReadRegRs) OR
    - (ID/EX.RegDst=0 AND ID/EX.WriteRegRt=IF/ID.ReadRegRt) OR
    - (ID/EX.RegDst=1 AND ID/EX.WriteRegRd=IF/ID.ReadRegRt))
  - OR EX/MEM.RegWrite AND
    - ((EX/MEM.WriteReg = IF/ID.ReadRegRs) OR
    - (EX/MEM.WriteReg = IF/ID.ReadRegRt))
  - OR MEM/WB.RegWrite AND
    - ((MEM/WB.WriteReg = IF/ID.ReadRegRs) OR
    - (MEM/WB.WriteReg = IF/ID.ReadRegRt))
- Notation: in ID/EX.RegDst, "ID/EX" names the pipeline register and "RegDst" the field within it
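The bubble condition can be written directly as a predicate. A sketch assuming numeric MIPS register fields; the dataclass names mirror the slide's notation but are hypothetical. Like the slide, it assumes no forwarding and does not special-case register `$0`. The four ID/EX terms collapse into one comparison once RegDst selects rt or rd.

```python
from dataclasses import dataclass

@dataclass
class IFID:
    read_reg_rs: int
    read_reg_rt: int

@dataclass
class IDEX:
    reg_write: bool
    reg_dst: int          # 0: destination is rt (e.g. lw), 1: destination is rd (R-type)
    write_reg_rt: int
    write_reg_rd: int

@dataclass
class LaterStage:         # EX/MEM or MEM/WB: destination register already selected
    reg_write: bool
    write_reg: int

def insert_bubble(if_id, id_ex, ex_mem, mem_wb):
    """True if any in-flight writer targets the rs or rt read by the instruction in IF/ID."""
    srcs = (if_id.read_reg_rs, if_id.read_reg_rt)
    # (RegDst=0 AND Rt==src) OR (RegDst=1 AND Rd==src), for src in {Rs, Rt}
    id_ex_dest = id_ex.write_reg_rd if id_ex.reg_dst == 1 else id_ex.write_reg_rt
    return bool((id_ex.reg_write and id_ex_dest in srcs)
                or (ex_mem.reg_write and ex_mem.write_reg in srcs)
                or (mem_wb.reg_write and mem_wb.write_reg in srcs))

idle = LaterStage(False, 0)
# lw $2, 0($1) in ID/EX (writes rt = $2); add $4, $2, $3 in IF/ID reads $2
print(insert_bubble(IFID(2, 3), IDEX(True, 0, 2, 0), idle, idle))  # True
```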


# Detecting Data Hazards




# RAW: Detect and Stall

- detect RAW & stall instruction at ID before register read
  - mechanics? disable PC, F/D write
  - RAW detection? compare register names
    - notation: rs1(D) = src register #1 of inst. in D stage
    - compare: rs1(D) & rs2(D) w/ rd(D/X), rd(X/M), rd(M/W)
    - stall (disable PC + F/D, clear D/X) on any match
  - RAW detection? register busy-bits
    - set for rd(D/X) when instruction passes ID
    - clear for rd(M/W)
    - stall if rs1(D) or rs2(D) are "busy"
  - (plus) low cost, simple
  - (minus) low performance (many stalls)
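The busy-bit variant can be simulated cycle by cycle. A minimal sketch assuming no forwarding, a register file written in the first half of a cycle and read in the second (so W clears before D checks), and instructions given as hypothetical `(dest, src1, src2)` tuples. It keeps a per-register count of in-flight writers instead of a single bit, so two in-flight writes to the same register do not clear each other's busy state.

```python
from collections import Counter

def run(program):
    """Simulate F D X M W with busy-bit stalls; return (total_cycles, stall_cycles).

    program: non-empty list of (dest, src1, src2) register names (None = unused).
    """
    busy = Counter()                     # register -> in-flight writers (> 0 = "busy")
    F, D, X, M, W = 0, None, None, None, None   # instruction index in each stage
    next_fetch, cycle, stalls = 1, 0, 0
    while True:
        cycle += 1
        if W is not None:                # first half: write back and clear busy
            if program[W][0]:
                busy[program[W][0]] -= 1
            if W == len(program) - 1:
                return cycle, stalls
        srcs = program[D][1:] if D is not None else ()
        if any(r and busy[r] > 0 for r in srcs):
            stalls += 1                  # disable PC + F/D write, clear D/X (bubble)
            X, M, W = None, X, M
            continue
        if D is not None and program[D][0]:
            busy[program[D][0]] += 1     # set busy as the instruction passes ID
        fetched = next_fetch if next_fetch < len(program) else None
        if fetched is not None:
            next_fetch += 1
        F, D, X, M, W = fetched, F, D, X, M

# add r1, r2, r3 followed immediately by sub r4, r1, r5
print(run([("r1", "r2", "r3"), ("r4", "r1", "r5")]))   # (8, 2)
```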


#### Branch signal determined in MEM stage





# Dealing w/branch hazards: always stall

- Branch not taken
  - Still must wait 3 cycles
  - Time lost
  - Could have spent cycles fetching and decoding next instructions

(Figure: pipeline diagram over clock cycles CC 1 to CC 12 for `40 beq $1, $3, $28`, three stall bubbles, then `44 and $12, $2, $5`, `48 or $13, $6, $2`, and `52 add $14, $2, $2`)
|                       |                    |                              |                               |            |


#### Flushing unwanted instructions from pipeline

- Useful to compare w/stalling pipeline:
  - Simple stall: inject bubble into pipe at ID stage only
    - Change control to 0 in the ID stage
    - Let "bubbles" percolate to the right
  - Flushing pipe: must change inst. in IF, ID, and EX
    - IF Stage:
      - Zero instruction field of IF/ID pipeline register
      - Use new control signal IF.Flush
    - ID Stage:
      - Use existing "bubble injection" mux that zeros control for stalls
      - Signal ID.Flush is ORed w/stall signal from hazard detection unit
    - EX Stage:
      - Add new muxes to zero EX pipeline register control lines
      - Both muxes controlled by single EX.Flush signal
- Control determines when to flush:
  - Depends on Opcode and value of branch condition
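The per-stage flush actions can be modeled as the values latched into each pipeline register at the end of a cycle. This is a hypothetical sketch: control fields are a dict, `NOP` is an all-zero instruction word (which MIPS decodes as `sll $0, $0, 0`), and the branch is assumed to be resolved in MEM with the pipeline having guessed "not taken".

```python
NOP = 0   # all-zero instruction word, decoded as a no-op

def zeroed(ctrl):
    """Same control fields, all deasserted (a bubble)."""
    return {name: 0 for name in ctrl}

def latch(if_id_old, fetched, id_ctrl, ex_ctrl, branch_taken, stall):
    """Values clocked into IF/ID, ID/EX, EX/MEM at the end of one cycle.

    branch_taken: the branch now in MEM resolved as taken (guessed not taken)
    stall: stall request from the hazard detection unit
    """
    if_flush = id_flush = ex_flush = branch_taken   # control: opcode + branch condition
    if if_flush:
        if_id = NOP              # IF.Flush zeros the instruction field of IF/ID
    elif stall:
        if_id = if_id_old        # stall: IF/ID write disabled, keep old contents
    else:
        if_id = fetched
    # ID stage: existing bubble-injection mux, selected by ID.Flush OR stall
    id_ex = zeroed(id_ctrl) if (id_flush or stall) else dict(id_ctrl)
    # EX stage: new muxes driven by the single EX.Flush signal
    ex_mem = zeroed(ex_ctrl) if ex_flush else dict(ex_ctrl)
    return if_id, id_ex, ex_mem
```

A simple stall zeros only the ID/EX control, while a taken branch clears all three registers.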






# Branch Penalty Impact

- Assume 16% of all instructions are branches
  - 4% unconditional branches: 3 cycle penalty
  - 12% conditional: 50% taken (taken branches also pay the 3 cycle penalty)
- For a sequence of N instructions (assume N is large)
  - N cycles to initiate each
  - 3 \* 0.04 \* N delays due to unconditional branches
  - 0.5 \* 3 \* 0.12 \* N delays due to conditional taken
  - Also, an extra 4 cycles for pipeline to empty
- Total:
  - 1.3\*N + 4 total cycles (or 1.3 cycles/instruction) (CPI)
    - 30% Performance Hit!!! (Bad thing)
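The arithmetic above can be checked with a short sketch; the function names are hypothetical, and the 4-cycle drain term matches the slide's "extra 4 cycles for pipeline to empty".

```python
def branch_cpi(uncond, cond, taken, penalty):
    """Ideal CPI of 1 plus stalls from unconditional and taken conditional branches."""
    return 1 + uncond * penalty + cond * taken * penalty

def total_cycles(n, cpi, drain=4):
    """Cycles for n instructions: n * CPI plus `drain` cycles for the pipeline to empty."""
    return cpi * n + drain

cpi = branch_cpi(uncond=0.04, cond=0.12, taken=0.5, penalty=3)
print(round(cpi, 2))                       # 1.3
print(round(total_cycles(1_000_000, cpi)))
```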


#### **Branch Penalty Impact**

#### Some solutions:

- In ISA: branches always execute next 1 or 2 instructions
  - Instruction so executed said to be in delay slot
  - See SPARC ISA
  - (example loop counter update)
- In organization: move comparator to ID stage and decide in the ID stage
  - Reduces branch delay by 2 cycles
  - Increases the cycle time
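As a rough worked example, assume the same branch mix as the previous slide (4% unconditional, 12% conditional with 50% taken) and a remaining penalty of 3 - 2 = 1 cycle once the comparator moves to ID. Ignoring the longer cycle time, the CPI would drop to about:

$$\text{CPI} \approx 1 + 0.04 \times 1 + 0.5 \times 0.12 \times 1 = 1.10$$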




# **Branch Prediction**

- Prior solutions are "ugly"
- Better (& more common): guess in IF stage
  - Technique is called "branch prediction"; needs 2 parts:
    - "Predictor" to guess where/if instruction will branch (and to where)
    - "Recovery Mechanism": i.e. a way to fix your mistake
  - Prior strategy:
    - Predictor: always guess branch never taken
    - Recovery: flush instructions if branch taken
  - Alternative: accumulate info. in IF stage as to...
    - Whether or not for any particular PC value a branch was taken next
    - To where it is taken
    - How to update with information from later stages
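The "alternative" can be sketched as a small table indexed by PC that records recent outcomes and the last target. This hedged sketch uses 2-bit saturating counters, a common design choice that may not match the predictor this lecture presents; the table size, the untagged indexing, and the class name are assumptions.

```python
class BranchPredictor:
    """PC-indexed table of 2-bit saturating counters plus last-seen target (no tags)."""

    def __init__(self, entries=16):
        self.entries = entries
        self.counter = [1] * entries      # 0-1 predict not taken, 2-3 predict taken
        self.target = [None] * entries

    def _index(self, pc):
        return (pc >> 2) % self.entries   # word-aligned PCs; aliasing is possible

    def predict(self, pc):
        """Predictor: guess the next PC during IF."""
        i = self._index(pc)
        if self.counter[i] >= 2 and self.target[i] is not None:
            return self.target[i]
        return pc + 4

    def update(self, pc, taken, target):
        """Train with the outcome computed in a later stage."""
        i = self._index(pc)
        if taken:
            self.counter[i] = min(3, self.counter[i] + 1)
            self.target[i] = target
        else:
            self.counter[i] = max(0, self.counter[i] - 1)

# A loop branch at 0x40 that jumps back to 0x20 nine times, then falls through.
bp = BranchPredictor()
misses = 0
for taken in [True] * 9 + [False]:
    actual_next = 0x20 if taken else 0x44
    if bp.predict(0x40) != actual_next:
        misses += 1                       # recovery: flush the wrong-path instructions
    bp.update(0x40, taken, 0x20)
print(misses)  # 2
```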


# Computing Performance

- Program assumptions:
  - 23% loads and in  $\frac{1}{2}$  of cases, next instruction uses load value
  - 13% stores
  - 19% conditional branches
  - 2% unconditional branches
  - 43% other
- Machine Assumptions:
  - 5 stage pipe with all forwarding
    - Only penalty is 1 cycle on use of load value immediately after a load
    - Jumps are totally resolved in ID stage for a 1 cycle branch penalty
    - 75% branch prediction accuracy
    - 1 cycle delay on misprediction
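Under one straightforward reading of these assumptions (every penalty is 1 cycle, stores and "other" instructions add no stalls, and the 75% accuracy applies to the conditional branches), the CPI can be computed as follows; `cpi_with_hazards` is a hypothetical helper.

```python
def cpi_with_hazards(load=0.23, load_use=0.5, jump=0.02, branch=0.19,
                     accuracy=0.75, penalty=1):
    """Ideal CPI of 1 plus stall cycles per instruction from each hazard source."""
    load_stalls = load * load_use * penalty             # load immediately followed by a use
    jump_stalls = jump * penalty                        # unconditional jumps resolved in ID
    branch_stalls = branch * (1 - accuracy) * penalty   # mispredicted conditional branches
    return 1 + load_stalls + jump_stalls + branch_stalls

print(round(cpi_with_hazards(), 4))   # 1.1825
```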




**A Branch Predictor** 


The Answer:

Lots more examples (handout)

# Exception Hazards

- 40<sub>hex</sub>: sub \$11, \$2, \$4
- 44<sub>hex</sub>: and \$12, \$2, \$5
- 48<sub>hex</sub>: or \$13, \$6, \$2
- 4C<sub>hex</sub>: add \$1, \$2, \$1 (overflow in EX stage)
- 50<sub>hex</sub>: slt \$15, \$6, \$7 (already in ID stage)
- 54<sub>hex</sub>: lw \$16, 50(\$7) (already in IF stage)

- 40000040<sub>hex</sub>: sw \$25, 1000(\$0) (exception handler)
- 40000044<sub>hex</sub>: sw \$26, 1004(\$0)
- Need to transfer control to exception handler ASAP
  - Don't want invalid data to contaminate registers or memory
  - Need to flush instructions already in the pipeline
- Start fetching instructions from 40000040<sub>hex</sub>
  - Save addr. following offending instruction (50<sub>hex</sub>) in TrapPC (EPC)
  - Don't clobber \$1 - use for debugging
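The example's bookkeeping can be sketched as a function over the pipeline's contents when the overflow is detected. This is a simplified, hypothetical model: stages map to the PC of the instruction they hold, instructions past the faulting stage finish, and the faulting instruction plus everything behind it is flushed, so the overflowing `add` never writes `$1`.

```python
EXCEPTION_HANDLER = 0x40000040      # handler address used on the slide
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def take_exception(pipeline, faulting_stage):
    """pipeline maps stage name -> PC of the instruction there (None for a bubble).

    Returns (completed, flushed, epc, next_pc).
    """
    cut = STAGES.index(faulting_stage)
    flushed = [pipeline[s] for s in STAGES[:cut + 1] if pipeline[s] is not None]
    completed = [pipeline[s] for s in STAGES[cut + 1:] if pipeline[s] is not None]
    epc = pipeline[faulting_stage] + 4   # address following the offending instruction
    return completed, flushed, epc, EXCEPTION_HANDLER

state = {"IF": 0x54, "ID": 0x50, "EX": 0x4C, "MEM": 0x48, "WB": 0x44}
completed, flushed, epc, next_pc = take_exception(state, "EX")
print([hex(pc) for pc in flushed], hex(epc))   # ['0x54', '0x50', '0x4c'] 0x50
```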




#### Managing exception hazards gets much worse!

- Different exception types may occur in different stages:

| Exception Cause       | Where it occurs   |
|-----------------------|-------------------|
| Undefined instruction | ID                |
| Invoking OS           | EX                |
| I/O device request    | Flexible          |
| Hardware malfunction  | Anywhere/flexible |

- Challenge is to associate exception with proper instruction: difficult!
  - Relax this requirement in non-critical cases: imprecise exceptions
    - Most machines use precise exceptions
  - Further challenge: exceptions can happen at same time



# Summary

- Performance:
  - Execution time *or* throughput
    - Amdahl's law
  - Multi-bus/multi-unit circuits
    - one long clock cycle or N shorter cycles
- Pipelining
  - overlap independent tasks
  - Pipelining in processors
    - "hazards" limit opportunities for overlap